Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Implement refreshCredentialsIfRequired for intermediate token r… #1583

Draft
wants to merge 1 commit into
base: client-side-cab
Choose a base branch
from

Conversation

nbayati
Copy link

@nbayati nbayati commented Nov 23, 2024

This PR implements refreshCredentialsIfRequired which will be called by generateToken(). The refresh logic was implemented based on the design doc, which takes into account the refreshMargin and minimumTokenLifetime, and decides on either async or synchronous refresh if one is needed.

Notes:

  • This PR merges the changes to a feature branch and not the main branch.
  • Unit tests to follow. 

…efresh

Implement `refreshCredentialsIfRequired`, called by `generateToken()`, to handle token refresh. It uses `refreshMargin` and `minimumTokenLifetime` to decide on synchronous or asynchronous refresh
@product-auto-label product-auto-label bot added the size: m Pull request size is medium. label Nov 23, 2024
Copy link

sonarcloud bot commented Nov 23, 2024

Quality Gate Failed Quality Gate failed

Failed conditions
0.0% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud

private static final Duration DEFAULT_MINIMUM_TOKEN_LIFETIME = Duration.ofMinutes(3);
private final Object refreshLock = new Object[0]; // Lock for refresh operations
@Nullable private SettableFuture<Void> currentRefreshFuture;
private final ExecutorService backgroundExecutor = Executors.newSingleThreadExecutor();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't being lifecycle managed. If we create this, then we'll need to close this as well.

To keep with the existing Auth flow, let's use MoreExecutors.directExecutor()


// The STS endpoint will only return the expiration time for the intermediary token
synchronized (refreshLock) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

qq, why do we need a lock here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From comment thread in the design doc:

We still need to synchronize access to the intermediary token and session key. Concurrent access by the background refresh task and main user thread could lead to inconsistencies. Because the background refresh task updates these values separately, there's a chance the main thread could grab a new token with an old key, causing problems.

Comment on lines +177 to +196
synchronized (refreshLock) {
if (currentRefreshFuture != null && !currentRefreshFuture.isDone()) {
try {
currentRefreshFuture.get(); // Wait for the asynchronous refresh to complete.
} catch (InterruptedException e) {
Thread.currentThread().interrupt(); // Restore the interrupt status
throw new IOException("Interrupted while waiting for asynchronous refresh.", e);
} catch (ExecutionException e) {
Throwable cause = e.getCause(); // Unwrap the underlying cause
if (cause instanceof IOException) {
throw (IOException) cause;
} else {
throw new IOException("Asynchronous refresh failed.", cause);
}
}
} else {
// No asynchronous refresh is running, perform a synchronous refresh.
refreshCredentials();
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit concerned with this. I don't think we should be holding a lock over a network call (refreshCredentials) or while waiting for the future (currentRefreshFuture.get()).

Is there a reason why this needs to obtain the lock?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not used to async programming so there's a good chance I've over-used the lock! My thought process was that we need to obtain the lock before checking/setting the value of currentRefreshFuture because it can be updated by another thread. We could only hold the lock while checking the value though, something like:

    Boolean isAsyncInProgress = false;
    synchronized (refreshLock) {
      isAsyncInProgress = currentRefreshFuture != null && !currentRefreshFuture.isDone();
    }
    if (isAsyncInProgress) {
      try {
        ... 
      }
    } else {
      // No asynchronous refresh is running, perform a synchronous refresh.
      refreshCredentials();
    }
  }

Would this address your concern?

private static final Duration DEFAULT_REFRESH_MARGIN = Duration.ofMinutes(30);
private static final Duration DEFAULT_MINIMUM_TOKEN_LIFETIME = Duration.ofMinutes(3);
private final Object refreshLock = new Object[0]; // Lock for refresh operations
@Nullable private SettableFuture<Void> currentRefreshFuture;
Copy link
Contributor

@lqiu96 lqiu96 Nov 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to marked with volatile


private void startAsynchronousRefresh() {
// Obtain the lock before checking or modifying currentRefreshFuture to prevent race conditions.
synchronized (refreshLock) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

qq, why do we need to obtain the lock here. I think for this it's probably fine for async, but I'm not putting together why it's needed.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lock here prevents multiple concurrent asynchronous refreshes from starting. Without the lock, the following scenario could occur:

  1. Thread A checks currentRefreshFuture and finds it null or done.
  2. Before Thread A can assign a new future to currentRefreshFuture, Thread B also checks and finds it null or done.
  3. Both Thread A and Thread B start a refresh, wasting resources and potentially leading to inconsistencies if they complete out of order and overwrite each other's results.

The lock ensures that only one asynchronous refresh can be in progress at any given time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I think I understand. I think there are a few things regarding this.

  1. When either an async or sync call is made, we really only want one execution/ call to go through at a time. Even if refreshCredentialsIfRequired() is called repeatedly, then one call really should be made. For that, I believe we would want functionality similar to this. It will find the existing execution or create a new one as a transaction. If multiple calls are made at the same time, each of the calls will find the same refresh action while it's still active.
  2. For the sync flow, we should kick off the refresh action here via the createRefreshTask linked above and block until the future is resolved via blockingRefresh().
  3. For the async flow, it should just create a refresh action via createRefreshTask

Comment on lines +67 to +68
private static final Duration DEFAULT_REFRESH_MARGIN = Duration.ofMinutes(30);
private static final Duration DEFAULT_MINIMUM_TOKEN_LIFETIME = Duration.ofMinutes(3);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible for DEFAULT_REFRESH_MARGIN to be less than DEFAULT_MINIMUM_TOKEN_LIFETIME? Do we need any validation for user configurations?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size: m Pull request size is medium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants